10 research outputs found

    Policy iteration algorithm for zero-sum stochastic games with mean payoff

    We give a policy iteration algorithm to solve zero-sum stochastic games with finite state and action spaces and perfect information, when the value is defined in terms of the mean payoff per turn. This algorithm does not require any irreducibility assumption on the Markov chains determined by the strategies of the players. It is based on a discrete nonlinear analogue of the notion of reduction of a super-harmonic function.
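
    For orientation, here is a minimal sketch of Howard-style policy iteration in the much simpler one-player, unichain special case (a mean-payoff Markov decision process). The paper's contribution is precisely the setting without irreducibility assumptions, where this naive evaluation step degenerates and the reduction of super-harmonic functions is needed; all names and the data layout below are illustrative assumptions, not the paper's algorithm.

    import numpy as np

    def evaluate(P, r, policy):
        # Solve g*1 + h = r_pi + P_pi h with normalization h[0] = 0.
        # Valid when the chain induced by `policy` has one recurrent class.
        n = len(policy)
        P_pi = np.array([P[s][policy[s]] for s in range(n)])
        r_pi = np.array([r[s][policy[s]] for s in range(n)])
        A = np.zeros((n + 1, n + 1))
        b = np.zeros(n + 1)
        A[:n, 0] = 1.0                 # coefficient of the gain g
        A[:n, 1:] = np.eye(n) - P_pi   # (I - P_pi) applied to the bias h
        b[:n] = r_pi
        A[n, 1] = 1.0                  # normalization h[0] = 0
        x = np.linalg.solve(A, b)
        return x[0], x[1:]             # mean payoff g, bias vector h

    def policy_iteration(P, r):
        # P[s][a]: probability vector over next states; r[s][a]: payoff.
        n = len(P)
        policy = [0] * n
        while True:
            g, h = evaluate(P, r, policy)
            new_policy = list(policy)
            for s in range(n):
                best = r[s][policy[s]] + np.dot(P[s][policy[s]], h)
                for a in range(len(P[s])):
                    val = r[s][a] + np.dot(P[s][a], h)
                    if val > best + 1e-9:   # keep current action on ties
                        best, new_policy[s] = val, a
            if new_policy == policy:
                return g, h, policy
            policy = new_policy

    # Toy instance: in state 0, staying pays 1 per turn, while alternating
    # through state 1 pays (0 + 4)/2 = 2 per turn; the latter is optimal.
    P = [[np.array([1.0, 0.0]), np.array([0.0, 1.0])],
         [np.array([1.0, 0.0])]]
    r = [[1.0, 0.0], [4.0]]
    g, h, pi = policy_iteration(P, r)   # g = 2.0, pi = [1, 0]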

    Policy iteration algorithms for monotone contracting maps (Algorithmes d'itération sur les politiques pour les applications monotones contractantes)

    PARIS-MINES ParisTech (751062310) / Sudoc, France

    Solving multichain stochastic games with mean payoff by policy iteration

    Zero-sum stochastic games with finite state and action spaces, perfect information, and mean payoff criteria arise in particular from the monotone discretization of mean-payoff pursuit-evasion deterministic differential games. In that case, no irreducibility assumption on the Markov chains associated with strategies is satisfied (multichain games). The value of such a game can be characterized by a system of nonlinear equations, involving the mean payoff vector and an auxiliary vector (relative value or bias). Cochet-Terrasson and Gaubert proposed in (C. R. Math. Acad. Sci. Paris, 2006) a policy iteration algorithm relying on a notion of nonlinear spectral projection (Akian and Gaubert, Nonlinear Analysis TMA, 2003), which allows one to avoid cycling in degenerate iterations. We give here a complete presentation of the algorithm, with implementation details, in particular of the nonlinear projection. This has led to the software PIGAMES and allowed us to present numerical results on pursuit-evasion games.
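
    For reference, in the one-player (Markov decision process) special case, the system of nonlinear equations mentioned here reduces to the classical multichain average-payoff optimality equations; the following is textbook material rather than the paper's two-player system:

    g(s) = \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, g(s'),
    g(s) + h(s) = \max_{a \in A^*(s)} \Big( r(s, a) + \sum_{s'} P(s' \mid s, a)\, h(s') \Big),

    where g is the mean payoff (gain) vector, h the bias (relative value), and A^*(s) the set of actions attaining the first maximum. In the two-player perfect-information game, the max is replaced by a min in the states controlled by the minimizing player.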

    Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information

    Preprint arXiv:1208.0446, 34 pages. We consider zero-sum stochastic games with finite state and action spaces, perfect information, mean payoff criteria, without any irreducibility assumption on the Markov chains associated to strategies (multichain games). The value of such a game can be characterized by a system of nonlinear equations, involving the mean payoff vector and an auxiliary vector (relative value or bias). We develop here a policy iteration algorithm for zero-sum stochastic games with mean payoff, following an idea of two of the authors (Cochet-Terrasson and Gaubert, C. R. Math. Acad. Sci. Paris, 2006). The algorithm relies on a notion of nonlinear spectral projection (Akian and Gaubert, Nonlinear Analysis TMA, 2003), which is analogous to the notion of reduction of super-harmonic functions in linear potential theory. To avoid cycling, at each degenerate iteration (in which the mean payoff vector is not improved), the new relative value is obtained by reducing the earlier one. We show that the sequence of values and relative values satisfies a lexicographical monotonicity property, which implies that the algorithm does terminate. We illustrate the algorithm by a mean-payoff version of Richman games (stochastic tug-of-war or discrete infinity Laplacian type equation), in which degenerate iterations are frequent. We report numerical experiments on large scale instances, arising from the latter games, as well as from monotone discretizations of a mean-payoff pursuit-evasion deterministic differential game.
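
    The Richman (tug-of-war) example class is concrete enough to sketch. In a minimal made-up instance, the dynamic programming (Shapley) operator averages the best moves of the two players, a discrete infinity-Laplacian-type map; iterating it from zero merely estimates the mean payoff per turn, whereas the paper's policy iteration computes it exactly:

    import numpy as np

    def tug_of_war_operator(neighbors, payoff):
        # One turn: a fair coin decides whether the maximizer or the
        # minimizer moves the token along an edge; payoff[s] is the
        # reward collected in state s.
        def F(x):
            return np.array([payoff[s]
                             + 0.5 * max(x[t] for t in neighbors[s])
                             + 0.5 * min(x[t] for t in neighbors[s])
                             for s in range(len(neighbors))])
        return F

    neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # 4-cycle
    payoff = [1.0, 0.0, 0.0, 0.0]
    F, x, K = tug_of_war_operator(neighbors, payoff), np.zeros(4), 2000
    for _ in range(K):
        x = F(x)
    print(x / K)   # F^K(0)/K approximates the mean payoff vector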

    Numerical computation of spectral elements in max-plus algebra

    We describe the specialization to max-plus algebra of Howard’s policy improvement scheme, which yields an algorithm to compute the solutions of spectral problems in the max-plus semiring. Experimentally, the algorithm shows a remarkable (almost linear) average execution time.
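
    The max-plus spectral problem is A ⊗ x = λ ⊗ x, i.e. max_j (A_ij + x_j) = λ + x_i, and for an irreducible matrix the eigenvalue λ is the maximum cycle mean of the weighted graph of A. The sketch below computes λ by Karp's classical algorithm rather than by the Howard policy-improvement scheme the paper specializes (Karp is shorter to state; the abstract's point is that the Howard scheme is experimentally much faster):

    import math

    NEG = -math.inf   # encodes "no edge" in the max-plus semiring

    def max_cycle_mean(A):
        # Karp: lambda = max_v min_k (D[n][v] - D[k][v]) / (n - k), where
        # D[k][v] is the maximum weight of a path with exactly k edges
        # from node 0 to v. Assumes the graph of A is strongly connected.
        n = len(A)
        D = [[NEG] * n for _ in range(n + 1)]
        D[0][0] = 0.0
        for k in range(1, n + 1):
            for v in range(n):
                D[k][v] = max((D[k - 1][u] + A[u][v] for u in range(n)
                               if A[u][v] > NEG and D[k - 1][u] > NEG),
                              default=NEG)
        best = NEG
        for v in range(n):
            if D[n][v] == NEG:
                continue
            best = max(best, min((D[n][v] - D[k][v]) / (n - k)
                                 for k in range(n) if D[k][v] > NEG))
        return best

    A = [[NEG, 2.0, NEG],
         [NEG, NEG, 3.0],
         [1.0, NEG, 5.0]]
    print(max_cycle_mean(A))   # cycle means are 2.0 and 5.0 -> prints 5.0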

    Dynamics of Min-Max Functions

    Functions F : R^n → R^n which are nonexpansive in the ℓ∞ norm and homogeneous, F_i(x_1 + h, …, x_n + h) = F_i(x_1, …, x_n) + h (so-called topical functions), have appeared recently in the work of several authors. They include (after suitable transformation) nonnegative matrices, Leontief substitution systems, Bellman operators of games and of Markov decision processes, examples arising from discrete event systems (digital circuits, computer networks, etc.) and the min-max functions studied in this paper. Any topical function F can be approximated by min-max functions in a way which preserves some of the dynamics of F. We attempt, therefore, to clarify the dynamics of min-max functions, with a view to developing a generalised Perron-Frobenius theory for topical functions. Our main concern is with the existence of generalised fixed points, where F(x_1, …, x_n) = (x_1 + h, …, x_n + h), which correspond…
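
    A quick numerical illustration with a made-up min-max function: each component is a min/max of terms of the form x_j + constant, so the two defining topical properties can be checked directly, and iterating F estimates the cycle-time vector lim_k F^k(x)/k, whose coordinates all equal h whenever a generalised fixed point F(x) = (x_1 + h, …, x_n + h) exists:

    import numpy as np

    def F(x):   # a made-up min-max function on R^3
        return np.array([
            max(x[0] + 1.0, min(x[1] + 2.0, x[2])),
            min(x[0], x[2] + 3.0),
            max(x[1] - 1.0, x[2] + 0.5),
        ])

    rng = np.random.default_rng(0)
    x, y, h = rng.normal(size=3), rng.normal(size=3), 1.7
    assert np.allclose(F(x + h), F(x) + h)          # additive homogeneity
    assert np.max(np.abs(F(x) - F(y))) <= np.max(np.abs(x - y)) + 1e-12
                                                    # sup-norm nonexpansive
    z = np.zeros(3)
    for _ in range(5000):
        z = F(z)
    print(z / 5000)   # estimate of the cycle-time vector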